
    A Fuzzy Hashing Approach Based on Random Sequences and Hamming Distance

    Hash functions are well-known methods in computer science to map arbitrarily large input to bit strings of a fixed length that serve as unique input identifiers/fingerprints. A key property of cryptographic hash functions is that even if only one bit of the input is changed, the output changes pseudo-randomly, and therefore similar files cannot be identified. However, in the area of computer forensics it is also necessary to find similar files (e.g., different versions of a file), which is why we need a similarity preserving hash function, also called a fuzzy hash function. In this paper we present a new approach for fuzzy hashing called bbHash. It is based on the idea of ‘rebuilding’ an input as well as possible using a fixed set of randomly chosen byte sequences called building blocks of byte length l (e.g., l = 128). The procedure is as follows: slide through the input byte by byte, read out the current input byte sequence of length l, and compute the Hamming distances of all building blocks against the current input byte sequence. Each building block with a Hamming distance smaller than a certain threshold contributes to the file’s bbHash. We discuss the (dis)advantages of bbHash compared to other fuzzy hashing approaches. A key property of bbHash is that it is the first fuzzy hashing approach based on a comparison to external data structures.
    Keywords: fuzzy hashing, similarity preserving hash function, similarity digests, Hamming distance, computer forensics
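    The sliding-window idea described above can be sketched in a few lines of Python. This is an illustrative sketch only: the number of building blocks, the threshold value, and the way matching blocks are encoded into the digest are assumptions for demonstration, not the parameters of the published bbHash implementation, and the naive double loop is deliberately unoptimized.

```python
import os

def hamming_distance(a: bytes, b: bytes) -> int:
    # Number of differing bits between two equally long byte sequences.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def bbhash(data: bytes, building_blocks: list, threshold: int) -> list:
    # Slide through the input byte by byte and record the index of every
    # building block whose Hamming distance to the current window is below
    # the threshold; the sequence of indices serves as the digest here.
    l = len(building_blocks[0])
    digest = []
    for offset in range(len(data) - l + 1):
        window = data[offset:offset + l]
        for idx, block in enumerate(building_blocks):
            if hamming_distance(window, block) < threshold:
                digest.append(idx)
    return digest

# Hypothetical parameters: 16 random building blocks of byte length l = 128.
blocks = [os.urandom(128) for _ in range(16)]
print(bbhash(os.urandom(2048), blocks, threshold=440))
```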

    Towards automated incident handling: how to select an appropriate response against a network-based attack?

    The increasing number of network-based attacks has evolved into one of the top concerns, being responsible for network infrastructure and service outages. In order to counteract these threats, computer networks are monitored to detect malicious traffic and initiate suitable reactions. However, initiating a suitable reaction is a process of selecting an appropriate response to the identified network-based attack. Selecting a response requires taking into account the economics of a reaction, e.g., its risks and benefits. The literature describes several response selection models, but they are not widely adopted. In addition, these models and their evaluation are often not reproducible due to closed testing data. In this paper, we introduce a new response selection model, called REASSESS, that mitigates network-based attacks by incorporating an intuitive response selection process that evaluates the negative and positive impacts associated with each countermeasure. We compare REASSESS with the response selection models of IE-IRS, ADEPTS, CS-IRS, and TVA and show that REASSESS is able to select the most appropriate response to an attack in consideration of the positive and negative impacts, and thus reduces the effects caused by a network-based attack. Further, we show that REASSESS is aligned with the NIST incident life cycle. We expect REASSESS to help organizations select the most appropriate response measure against a detected network-based attack, and hence contribute to mitigating such attacks.
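    As a rough illustration of weighing positive against negative impacts per countermeasure (not the actual REASSESS scoring, whose metrics are defined in the paper), a selection step could look like the following sketch; the candidate responses and their scores are invented.

```python
from dataclasses import dataclass

@dataclass
class Response:
    name: str
    positive_impact: float  # e.g., expected damage averted by the countermeasure
    negative_impact: float  # e.g., collateral cost such as blocked legitimate traffic

def select_response(candidates: list) -> Response:
    # Pick the response whose positive impact outweighs its negative impact the most.
    return max(candidates, key=lambda r: r.positive_impact - r.negative_impact)

candidates = [
    Response("block source addresses at the firewall", 8.0, 1.0),
    Response("rate-limit the affected service", 5.0, 2.0),
    Response("shut down the affected server", 9.0, 7.0),
]
print(select_response(candidates).name)  # -> block source addresses at the firewall
```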

    On the Database Lookup Problem of Approximate Matching

    Investigating seized devices within digital forensics becomes more and more difficult due to the increasing amount of data. Hence, a common procedure uses automated file identification, which reduces the amount of data an investigator has to look at by hand. Besides identifying exact duplicates, which is mostly solved using cryptographic hash functions, it is also helpful to detect similar data by applying approximate matching. Let x denote the number of digests in a database; then the lookup for a single similarity digest has a complexity of O(x). In other words, the digest has to be compared against all digests in the database. In contrast, cryptographic hash values are stored within binary trees or hash tables, and hence the lookup complexity of a single digest is O(log2(x)) or O(1), respectively. In this paper we present and evaluate a concept to extend existing approximate matching algorithms which reduces the lookup complexity from O(x) to O(1). Instead of using multiple small Bloom filters (which is the common procedure), we demonstrate that a single, huge Bloom filter has a far better performance. Our evaluation demonstrates that current approximate matching algorithms are too slow (e.g., over 21 min to compare 4457 digests of a common file corpus against each other) while the improved version solves this challenge within seconds. Studying the precision and recall rates shows that our approach works as reliably as the original implementations. We obtain this benefit at the cost of accuracy: the comparison is now a file-against-set comparison, and thus it is not possible to see which file in the database is matched.
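    A minimal sketch of the file-against-set idea follows, assuming a naive feature extraction based on fixed-size chunks (real approximate matching algorithms derive their features differently) and illustrative filter parameters; the point is that one large filter answers each membership query in constant time.

```python
import hashlib

class BloomFilter:
    # One single, large Bloom filter; a membership test touches k bits and is
    # therefore O(1), independent of how many features were inserted.
    def __init__(self, m_bits: int = 2**24, k: int = 5):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: bytes):
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def features(data: bytes, chunk: int = 64):
    # Naive stand-in for feature extraction: non-overlapping fixed-size chunks.
    return [data[i:i + chunk] for i in range(0, len(data) - chunk + 1, chunk)]

# Insert the features of all database files into the one filter ...
db = BloomFilter()
for content in (b"A" * 4096, b"B" * 4096):  # stand-ins for database files
    for feat in features(content):
        db.add(feat)

# ... and answer a query with a file-against-set comparison: how many of the
# query's features are (probably) contained in the database?
query = b"A" * 2048 + b"C" * 2048
hits = sum(feat in db for feat in features(query))
print(f"{hits} of {len(features(query))} query features matched the set")
```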

    An Efficient Similarity Digests Database Lookup – A Logarithmic Divide & Conquer Approach

    Investigating seized devices within digital forensics represents a challenging task due to the increasing amount of data. Common procedures utilize automated file identification, which reduces the amount of data an investigator has to examine manually. In the past years, the research field of approximate matching has emerged to detect similar data. However, if n denotes the number of similarity digests in a database, then the lookup for a single similarity digest is of complexity O(n). This paper presents a concept to extend existing approximate matching algorithms which reduces the lookup complexity from O(n) to O(log(n)). Our proposed approach is based on the well-known divide and conquer paradigm and builds a Bloom filter-based tree data structure in order to enable an efficient lookup of similarity digests. Further, it is demonstrated that the presented technique is highly scalable, offering a trade-off between storage requirements and computational efficiency. We perform a theoretical assessment based on recently published results and reasonable magnitudes of input data, and show that the complexity reduction achieved by the proposed technique yields a 2^20-fold acceleration of lookup costs.
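    The divide-and-conquer lookup can be sketched as follows; to keep the example short, Python sets stand in for the per-node Bloom filters, and the files and their feature sets are invented.

```python
class Node:
    # One node of the lookup tree. In the real data structure each node holds a
    # Bloom filter over the features of all files in its subtree; a set is used
    # here as a drop-in replacement with the same membership interface.
    def __init__(self, files):
        self.files = files                        # list of (name, feature_set)
        self.filter = set().union(*(feats for _, feats in files))
        self.left = self.right = None
        if len(files) > 1:                        # divide: split the file set in half
            mid = len(files) // 2
            self.left, self.right = Node(files[:mid]), Node(files[mid:])

def lookup(node, feature):
    # Descend only into subtrees whose filter (probably) contains the feature;
    # if the feature occurs in a single file, O(log n) nodes are visited.
    if feature not in node.filter:
        return []
    if node.left is None:                         # leaf: a single candidate file
        return [name for name, _ in node.files]
    return lookup(node.left, feature) + lookup(node.right, feature)

# Hypothetical database of eight files with four features each.
db = [(f"file{i}", {f"feat-{i}-{j}".encode() for j in range(4)}) for i in range(8)]
root = Node(db)
print(lookup(root, b"feat-5-2"))                  # -> ['file5']
```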

    On Efficiency of Artifact Lookup Strategies in Digital Forensics

    In recent years, different strategies have been proposed to handle the problem of ever-growing digital forensic databases. One concept to deal with this data overload is data reduction, which essentially means separating the wheat from the chaff, e.g., filtering in forensically relevant data. Prominent techniques in the context of data reduction are hash-based solutions. Data reduction is achieved because hash values (of possibly large data input) are much smaller than the original input. Today's approaches to storing hash-based data fragments range from large-scale multithreaded databases to simple Bloom filter representations. One main focus has been the field of approximate matching, where sorting is a problem due to the fuzzy nature of the approximate hashes. A crucial step during digital forensic analysis is to achieve fast query times during lookup (e.g., against a blacklist), especially when only small or ordinary resources are available. However, comparing different database and lookup approaches is considerably hard, as most techniques differ in the considered use case and the integrated features. In this work we discuss, reassess and extend three widespread lookup strategies suitable for storing hash-based fragments: (1) the hash database for hash-based carving (hashdb), (2) hierarchical Bloom filter trees (hbft), and (3) flat hash maps (fhmap). We outline the capabilities of the different approaches, integrate new extensions, discuss possible features, and perform a detailed evaluation with a special focus on runtime efficiency. Our results reveal major advantages for fhmap in terms of runtime performance and applicability. Hbft showed comparable runtime efficiency for lookups, but suffers from pitfalls with respect to extensibility and maintenance. Finally, hashdb performs worst in a single-core environment in all evaluation scenarios. However, hashdb is the only candidate which offers full parallelization capabilities, transactional features, and single-level storage.
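    To illustrate the flat-map strategy on hash-based fragments, the following sketch stores block hashes of blacklisted content in an ordinary hash map and probes it once per evidence block; the block size, the blacklist content, and the plain Python dict are illustrative simplifications, not the storage formats of the evaluated tools.

```python
import hashlib

BLOCK = 4096  # illustrative block size for hash-based fragments

def block_hashes(data: bytes):
    # Hash the input in fixed-size blocks (the granularity of hash-based carving).
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data) - BLOCK + 1, BLOCK)]

# Build the flat hash map: block hash -> originating blacklist file.
blacklist = {}
for name, content in {"bad.bin": b"\xAA" * (3 * BLOCK)}.items():  # stand-in blacklist
    for h in block_hashes(content):
        blacklist[h] = name

# Querying the evidence is one O(1) hash-table probe per block.
evidence = b"\x00" * BLOCK + b"\xAA" * BLOCK + b"\x11" * BLOCK
for i, h in enumerate(block_hashes(evidence)):
    if h in blacklist:
        print(f"evidence block {i} matches {blacklist[h]}")
```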

    FRASHER – A framework for automated evaluation of similarity hashing

    A challenge for digital forensic investigations is dealing with the large amounts of data that need to be processed. Approximate matching (AM), a.k.a. similarity hashing or fuzzy hashing, plays a pivotal role in solving this challenge. Many algorithms have been proposed over the years, such as ssdeep, sdhash, MRSH-v2, or TLSH, which can be used for similarity assessment, clustering of different artifacts, or finding fragments and embedded objects. To assess the differences between these implementations (e.g., in terms of runtime efficiency, fragment detection, or resistance against obfuscation attacks), a testing framework is indispensable and is the core of this article. The proposed framework is called FRASHER (referring to its predecessor FRASH from 2013) and provides an up-to-date view on the problem of evaluating AM algorithms with respect to both conceptual and practical aspects. Consequently, we present and discuss relevant test case scenarios, and we release and demonstrate our framework, which allows a comprehensive evaluation of AM algorithms. Compared to its predecessor, we adapt it to a modern environment, providing better modularity and usability as well as more thorough test cases.
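    One of the test case scenarios mentioned above, fragment detection, can be expressed as a small generic harness; the comparison function below is a trivial stand-in (in practice it would wrap a real AM tool's comparison), and the fragment ratios and score threshold are arbitrary example values rather than FRASHER's actual configuration.

```python
import os
import random

def naive_compare(a: bytes, b: bytes, chunk: int = 64) -> int:
    # Trivial stand-in for an AM tool's comparison: the percentage of b's
    # 64-byte windows that also occur somewhere in a.
    wins_a = {a[i:i + chunk] for i in range(len(a) - chunk + 1)}
    wins_b = {b[i:i + chunk] for i in range(len(b) - chunk + 1)}
    return int(100 * len(wins_a & wins_b) / max(len(wins_b), 1))

def fragment_detection_test(compare, data: bytes, ratios=(0.5, 0.25, 0.1), threshold=20):
    # Cut random fragments of decreasing size out of the file and check whether
    # the comparison still reports a similarity score above the threshold.
    results = {}
    for r in ratios:
        size = int(len(data) * r)
        start = random.randrange(len(data) - size)
        fragment = data[start:start + size]
        results[r] = compare(data, fragment) >= threshold
    return results

print(fragment_detection_test(naive_compare, os.urandom(32 * 1024)))
# -> {0.5: True, 0.25: True, 0.1: True}
```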

    Distributed DDoS Defense: A Collaborative Approach at Internet Scale

    Distributed large-scale cyber attacks targeting the availability of computing and network resources still remain a serious threat. To limit the effects caused by those attacks and to provide a proactive defense, mitigation should move to the networks of Internet Service Providers (ISPs). In this context, this thesis focuses on the development of a collaborative, automated approach to mitigate the effects of Distributed Denial of Service (DDoS) attacks at Internet scale. The thesis makes the following contributions: (i) a systematic and multifaceted study on the mitigation of large-scale cyber attacks at ISPs; (ii) detailed guidance on selecting an exchange format and protocol suitable for disseminating threat information; (iii) the Flow-based Event Exchange Format (FLEX), developed to overcome the shortcomings of current exchange formats with respect to flow-based interoperability; (iv) a communication process to facilitate automated defense in response to ongoing network-based attacks; (v) a model to select and perform a semi-automatic deployment of suitable response actions; and (vi) an investigation of the effectiveness of moving-target defense techniques using Software Defined Networking (SDN) and their applicability in the context of large-scale cyber attacks and ISP networks. Finally, the thesis presents a trust model that determines the trust and knowledge level of a security event in order to deploy semi-automated remediations and to facilitate the dissemination of security event information using the exchange format FLEX in the context of ISP networks.